The aim of the analysis is to identify phosphorylation sites that are regulated upon antibody-based TCR activation in primary T-cells. A global analysis of our proteomic/phosphoproteomic data sets should allow us to study phospho-signalling resulting from TCR activation in primary cells.
Romain Roncagalli performed cell isolation and stimulation 4 times (R1: 20141204, R3: 20150306, R4: 20150713, R5: 20151103) as follows:
Day 0:
Day 2:
Day 3:
Add 5 ml of complete medium with IL-2/ well
Day 4:
After trypsin digestion, 5 µg of each peptide sample was injected on the Q-Exactive Plus for relative protein quantification.
Carine Froment performed the TiO2 enrichments and injected 2.5% of each enriched sample 3 to 4 times (labelled “TiO2”). She then performed a phospho-tyrosine IP on the remaining 90% (injected in 3 to 4 technical replicates, corresponding to 22.5% of the starting material). Samples were injected onto a Q-Exactive Plus for label-free relative quantification by MS. The analysis was performed using MaxQuant.
The input files are all in the folder RAW/:
- PeptideSamples/: output protein table from the MaxQuant analysis of all peptide samples prior to TiO2 and phospho-tyrosine enrichment.
- TiO2/: output phosphosite tables from the MaxQuant analysis of 10% of the samples after TiO2 and before phospho-tyrosine enrichment. Each biological replicate was searched independently.
- pYIP/: output phosphosite tables from the MaxQuant analysis of the samples after TiO2 and phospho-tyrosine enrichment. Each biological replicate was searched independently.

MaxQuant returns a table with one row per phosphorylation site. It reports the quantification values for multiply phosphorylated peptides but does not map them to the corresponding pairs (or triplets) of sites. I use a script to identify the quantification values coming from the multiply phosphorylated peptides and match them to the corresponding combinations of sites.
I keep only the sites with a PEP value <= 0.01.
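The PEP filter can be sketched as follows. This is a minimal illustration on a toy table, not the script used for the analysis: the data frame `sites` and its values are hypothetical, and the only assumption about the real input is that it carries a `PEP` column, as MaxQuant site tables do.

```r
# Toy stand-in for a MaxQuant phosphosite table (hypothetical values).
sites <- data.frame(
  id  = c("s1", "s2", "s3", "s4"),
  PEP = c(0.0001, 0.01, 0.02, NA),
  stringsAsFactors = FALSE
)

# Keep sites with PEP <= 0.01; rows with a missing PEP are dropped as well.
keep <- !is.na(sites$PEP) & sites$PEP <= 0.01
sites_filtered <- sites[keep, , drop = FALSE]

print(sites_filtered$id)
```

Handling `NA` explicitly avoids the rows full of `NA` that plain logical indexing would otherwise return.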
I correct some issues with protein IDs:
For each phosphorylation site ID, I create a list containing its ID with the gene name (for figures) and all the mono-phosphorylations it corresponds to.
To correct for technical variation due to the instrument, the samples were spiked with synthetic peptides (iRTs), and I use their intensity to normalise the data. I keep only the iRTs with a CV < 50% and take as a reference the median iRT signal of the runs in the 4th biological replicate (R4). This keeps the values close to those of the unstimulated (NS) samples in R4, where there was an issue with the spike and I cannot perform the normalisation.
## [1] 1625700000
The signal is very low for the iRTs in the TiO2_R4 unstimulated. I decide to not normalise these runs to avoid creating a bias with very low normalisation factors…
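One possible reading of the iRT normalisation described above, as a sketch under stated assumptions: the matrices `irt` and `sites`, the run names and all values are hypothetical, the per-run iRT signal is taken as the median over the retained iRT peptides, and the factor is defined as reference divided by run signal. The original script may differ on each of these points.

```r
# Hypothetical iRT intensities: rows = iRT peptides, columns = runs.
irt <- matrix(
  c(100, 110,  90, 200,
    200, 220, 180, 400,
     10, 300,   5, 150,   # unstable peptide, fails the CV filter
    400, 440, 360, 800),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("iRT", 1:4), c("R4_a", "R4_b", "R5_a", "R5_b"))
)

# Coefficient of variation of each iRT peptide across runs; keep CV < 50%.
cv <- apply(irt, 1, function(x) sd(x) / mean(x))
stable <- irt[cv < 0.5, , drop = FALSE]

# Per-run iRT signal (assumption: median over the retained peptides).
run_signal <- apply(stable, 2, median)

# Reference: median iRT signal over the runs of replicate R4.
reference <- median(run_signal[grepl("^R4_", names(run_signal))])

# Normalisation factor per run (assumption: reference / run signal).
norm_factor <- reference / run_signal

# Hypothetical phosphosite intensities measured in the same runs.
sites <- matrix(
  c(1000, 1100, 900, 2000,
      50,   55,  45,  100),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("siteA", "siteB"), colnames(irt))
)

# Multiply each column (run) by its normalisation factor.
sites_norm <- sweep(sites, 2, norm_factor, "*")
print(round(sites_norm, 1))
```

Runs that should stay unnormalised, such as the TiO2_R4 unstimulated runs, could simply have their factor set to 1 before the `sweep()` call.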
For replicate 1 (R1) of the TiO2 samples, we noticed that the injection 1 runs were not run under optimal conditions: more than 50% of the values are missing in these runs. We remove them.
I also remove R1_S120.Inj3bis, which has a lower intensity than the other runs and would affect the statistical analysis, as well as RR_ProFI_S4_R4_CCF01414_EpTyr_NS.Inj3.
There are 10047 unique sites or combinations of sites (from multiply phosphorylated peptides) in the whole data set. These correspond to 9560 individual phosphorylation sites.
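The two kinds of run removal above (by fraction of missing values, then by name) could look like the sketch below. The intensity matrix `x` and the run names `runA` to `runD` are hypothetical placeholders, not the real run identifiers.

```r
# Hypothetical intensity matrix: rows = sites, columns = runs.
x <- matrix(
  c( 1, NA,  3,  4,
    NA, NA,  2,  5,
     2, NA, NA,  6,
     3,  7,  4, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("site", 1:4), c("runA", "runB", "runC", "runD"))
)

# Fraction of missing values per run; drop runs with more than 50% missing.
missing_frac <- colMeans(is.na(x))
x_kept <- x[, missing_frac <= 0.5, drop = FALSE]

# Drop further runs by name (placeholder name, for illustration only).
runs_to_drop <- c("runD")
x_kept <- x_kept[, !colnames(x_kept) %in% runs_to_drop, drop = FALSE]

print(colnames(x_kept))
```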
## [1] "Number of unique site in the analysis:"
## [1] 10047
## [1] "Number of unique site in the pYIP:"
## [1] 560
## [1] "Number of unique site in the TiO2:"
## [1] 9702
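The difference between the number of entries (single sites plus combinations) and the number of individual sites can be illustrated as below. The ID format used here (`protein_residue+position`, joined by `+` for combinations) is purely hypothetical and was chosen for the example; the real IDs may be built differently.

```r
# Hypothetical site IDs; "+" joins the sites of a multiply phosphorylated peptide.
site_ids <- c("P1_S10", "P1_T12", "P1_S10+P1_T12", "P2_Y5", "P2_Y5+P2_S7")

# Unique entries: single sites and combinations of sites.
n_entries <- length(unique(site_ids))

# Individual sites: split the combinations and deduplicate.
individual <- unique(unlist(strsplit(site_ids, "+", fixed = TRUE)))
n_individual <- length(individual)

# Tally of phosphorylated residues (S, T, Y) among the individual sites.
residue <- sub(".*_([STY])[0-9]+$", "\\1", individual)
residue_counts <- table(factor(residue, levels = c("S", "T", "Y")))

print(c(entries = n_entries, individual = n_individual))
print(residue_counts)
```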
From now on, I will differentiate phosphorylation sites by their localisation score: > 0.75 (Class 1), > 0.5 and ≤ 0.75 (Class 2), and > 0.25 and ≤ 0.5 (Class 3).
The multiply-phosphorylated sites do not get a score. For sites identified multiple times, I keep the highest localisation score for the pie chart.
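A minimal sketch of this classification, assuming the localisation score is a probability between 0 and 1. The vectors `site` and `loc_prob` are hypothetical; the `LowLocScore` label for scores of 0.25 or below follows the table below.

```r
# Hypothetical localisation scores; some sites are identified more than once.
site     <- c("a", "a", "b", "c", "c", "d", "e")
loc_prob <- c(0.99, 0.75, 0.6, 0.5, 0.3, 0.25, 0.1)

# Keep the highest localisation score per site.
best <- tapply(loc_prob, site, max)

# Right-closed intervals: (0.75, Inf] Class 1, (0.5, 0.75] Class 2,
# (0.25, 0.5] Class 3, and everything at or below 0.25 is LowLocScore.
loc_class <- cut(
  best,
  breaks = c(-Inf, 0.25, 0.5, 0.75, Inf),
  labels = c("LowLocScore", "Class 3", "Class 2", "Class 1"),
  right  = TRUE
)

print(data.frame(site = names(best), score = as.vector(best), class = loc_class))
```

Using `cut()` with `right = TRUE` puts boundary values such as 0.75 and 0.5 in the lower class, in line with the "under or equal to" wording of the class definitions.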
| | Class | Number of sites | Sample |
|---|---|---|---|
| 1 | Class 1 | 5398 | TiO2 |
| 5 | Class 1 | 377 | pYIP |
| 2 | Class 2 | 1652 | TiO2 |
| 6 | Class 2 | 70 | pYIP |
| 3 | Class 3 | 1904 | TiO2 |
| 7 | Class 3 | 67 | pYIP |
| 4 | LowLocScore | 280 | TiO2 |
| 8 | LowLocScore | 19 | pYIP |
For the monophosphorylations, I keep the Class 1 and Class 2 phosphosites.
After this filter, there are 7810 unique phosphosites.
Number of phosphorylated amino-acids identified in the study.
Total number of pS, pT, pY identified: 7418, 1793, 349. These are present on 2331 proteins.
## [1] "Number of identified phosphorylated amino acid:"
## $TiO2
## TiO2_R1 TiO2_R3 TiO2_R4 TiO2_R5
## S 4690 4995 5395 5817
## T 716 824 919 1117
## Y 51 60 71 116
## total 5457 5879 6385 7050
##
## $pYIP
## pYIP_R1 pYIP_R3 pYIP_R5
## S 133 138 140
## T 37 43 45
## Y 236 262 262
## total 406 443 447
## [1] "Number of identified phosphorylated amino acid that are not reported in the mouse PhosphoSitePlus data:"
## $TiO2
## TiO2_R1 TiO2_R3 TiO2_R4 TiO2_R5
## S 454 493 570 700
## T 141 171 191 270
## Y 13 15 18 39
##
## $pYIP
## pYIP_R1 pYIP_R4 pYIP_R5
## S 13 14 14
## T 14 15 16
## Y 33 38 38
Number of each amino-acid in the table:
I think that the high number of S/T that we find in the pYIP can be explained by the contiguous sites in multiply-phosphorylated peptides.
For the monophosphorylations, I keep the Class 1 and Class 2 phosphosites before merging the tables. I then log2-transform the data set.
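The class filter and the log2 transformation can be sketched as follows. The table `sites`, its column names and its values are hypothetical. Treating an intensity of 0 as a missing value before taking the logarithm is an assumption of this sketch, made so that `log2()` does not produce `-Inf`.

```r
# Hypothetical table of monophosphorylated sites with two runs.
sites <- data.frame(
  id        = c("s1", "s2", "s3", "s4"),
  loc_class = c("Class 1", "Class 2", "Class 3", "Class 1"),
  run1      = c(1024, 0, 512, 256),
  run2      = c(2048, 128, 0, 0),
  stringsAsFactors = FALSE
)

# Keep Class 1 and Class 2 sites only.
kept <- sites[sites$loc_class %in% c("Class 1", "Class 2"), ]

# Build the intensity matrix, set zeros to NA (assumption), then log2-transform.
intens <- as.matrix(kept[, c("run1", "run2")])
rownames(intens) <- kept$id
intens[intens == 0] <- NA
log_int <- log2(intens)

print(log_int)
```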
Pairwise plot with log2 values observed per condition per biological replicate:
I make a figure with correlation between all runs for the supplementary data of the paper:
## Warning: Removed 3230 rows containing non-finite values (stat_bin).
## Warning: Removed 3230 rows containing non-finite values (stat_bin).
## Warning: Removed 5604 rows containing non-finite values (stat_bin).
## Warning: Removed 5604 rows containing non-finite values (stat_bin).
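A correlation matrix between runs in the presence of missing values can be computed as in the sketch below. The matrix `log_int` and its values are hypothetical, and the use of Pearson correlation on pairwise complete observations is an assumption; the figure in the report may have been built with other settings.

```r
# Hypothetical log2 intensities: rows = sites, columns = runs.
log_int <- cbind(
  run1 = c(1, 2, 3, NA, 5),
  run2 = c(2, 4, 6, 8, NA),
  run3 = c(5, 4, 3, 2, 1)
)

# Pearson correlation, using for each pair of runs the sites quantified in both.
run_cor <- cor(log_int, use = "pairwise.complete.obs")

print(round(run_cor, 2))
```

With `use = "pairwise.complete.obs"`, each coefficient is based on a different subset of sites, so runs with many missing values are compared on fewer points than the others.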
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] corrplot_0.84 knitr_1.21 gplots_3.0.1.1 ggplot2_3.1.0
## [5] reshape2_1.4.3 venneuler_1.1-0 rJava_0.9-10
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.0 highr_0.7 pillar_1.3.1
## [4] compiler_3.5.2 plyr_1.8.4 bindr_0.1.1
## [7] bitops_1.0-6 tools_3.5.2 digest_0.6.18
## [10] evaluate_0.12 tibble_2.0.1 gtable_0.2.0
## [13] pkgconfig_2.0.2 rlang_0.3.1 yaml_2.2.0
## [16] xfun_0.4 bindrcpp_0.2.2 withr_2.1.2
## [19] stringr_1.3.1 dplyr_0.7.8 caTools_1.17.1.1
## [22] gtools_3.8.1 grid_3.5.2 tidyselect_0.2.5
## [25] glue_1.3.0 R6_2.3.0 rmarkdown_1.11
## [28] gdata_2.18.0 purrr_0.2.5 magrittr_1.5
## [31] scales_1.0.0 htmltools_0.3.6 assertthat_0.2.0
## [34] colorspace_1.4-0 labeling_0.3 KernSmooth_2.23-15
## [37] stringi_1.2.4 lazyeval_0.2.1 munsell_0.5.0
## [40] crayon_1.3.4